
    Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction

    Frame-level visual features are generally aggregated over time with techniques such as LSTM, Fisher Vectors, and NetVLAD to produce a robust video-level representation. Here we introduce a learnable aggregation technique whose primary objective is to retain, in the representation, the short-time temporal structure between frame-level features and their spatial interdependencies. It can also be easily adapted to cases where training samples are very scarce. We evaluate the method on a real-fake expression prediction dataset to demonstrate its effectiveness. Our method obtains a score of 65% on the test dataset in the official MAP evaluation, only one misclassified decision behind the best reported result in the ChaLearn Challenge (i.e., 66.7%). Lastly, we believe that this method can be extended to other problems, such as action/event recognition, in the future. Comment: Submitted to International Conference on Computer Vision Workshop

    A simple and effective mechanism for stored video streaming with TCP transport and server-side adaptive frame discard

    Cataloged from PDF version of article. Transmission Control Protocol (TCP), with its well-established congestion control mechanism, is the prevailing transport layer protocol for non-real-time data in current Internet Protocol (IP) networks. It would be desirable to transmit any type of multimedia data using TCP in order to take advantage of the extensive operational experience behind TCP in the Internet. However, some features of TCP, including retransmissions and variations in throughput and delay, although not catastrophic for non-real-time data, may result in inefficiencies for video streaming applications. In this paper, we propose an architecture that consists of an input buffer at the server side, coupled with the congestion control mechanism of TCP at the transport layer, for efficiently streaming stored video in the best-effort Internet. The proposed buffer management scheme selectively discards low-priority frames from the head end of the buffer when they would otherwise jeopardize the successful playout of high-priority frames. Moreover, the proposed discarding policy adapts to changes in the bandwidth available to the video stream. © 2004 Elsevier B.V. All rights reserved.
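The abstract above does not spell out the discard rule, so the following is only a minimal sketch of one way a server-side, head-end frame discard could work. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the `Frame` fields, the I > P > B priority order, the per-frame playout deadline, and the idea that the sender is handed a current bandwidth estimate (`bandwidth_bps`), for example one derived from observed TCP throughput.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str         # "I", "P" or "B" (hypothetical labels)
    size_bits: int    # encoded frame size in bits
    deadline: float   # playout deadline in seconds (hypothetical field)


# Assumed priority order: I frames matter most, B frames least.
PRIORITY = {"I": 2, "P": 1, "B": 0}


def next_frame_to_send(buffer, now, bandwidth_bps):
    """Pop and return the next frame worth sending, or None if none is left.

    Frames at the head of the buffer are discarded when they cannot meet
    their own deadline, or when sending them would make the next
    higher-priority frame in the buffer miss its deadline.
    """
    while buffer:
        head = buffer[0]
        finish = now + head.size_bits / bandwidth_bps
        if finish > head.deadline:
            buffer.popleft()          # head would arrive too late anyway
            continue

        harmful = False
        for later in list(buffer)[1:]:
            if PRIORITY[later.kind] > PRIORITY[head.kind]:
                # Would the first higher-priority frame still be on time
                # if the head frame were sent before it?
                late = finish + later.size_bits / bandwidth_bps
                harmful = late > later.deadline
                break
            # Frames of equal or lower priority in between are ignored:
            # they could themselves be discarded later.
        if harmful:
            buffer.popleft()          # sacrifice the low-priority head
            continue
        return buffer.popleft()
    return None
```

Because the decision is re-evaluated with the current bandwidth estimate on every call, the same buffer contents lead to more discards when the estimate drops and fewer when it recovers, which is the adaptive behaviour the abstract describes in general terms.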

    AN ABSTRACTION BASED REDUCED REFERENCE DEPTH PERCEPTION METRIC FOR 3D VIDEO

    19th IEEE International Conference on Image Processing (ICIP) -- SEP 30-OCT 03, 2012 -- Lake Buena Vista, FL
    NUR YILMAZ, Gokce/0000-0002-0015-9519; B. Akar, Gozde/0000-0002-4227-5606
    WOS: 000319334900152
    In order to speed up the widespread proliferation of 3D video technologies (e.g., coding, transmission, display, etc.), the effect of these technologies on 3D perception should be investigated efficiently and reliably. Using Full-Reference (FR) objective metrics for this investigation is not practical, especially for "on the fly" evaluation of 3D perception. Thus, in this paper, a Reduced Reference (RR) metric is proposed to predict the depth perception of 3D video. The color-plus-depth 3D video representation is exploited for the proposed metric. Since the significant depth levels of the depth map sequences have great influence on the depth perception of users, they are used as side information in the proposed RR metric. To determine the significant depth levels, the depth map sequences are abstracted using a bilateral filter. The Video Quality Metric (VQM) is utilized to predict the depth perception ensured by the significant depth levels, owing to its good correlation with the Human Visual System (HVS). The performance assessment results show that the proposed RR metric can be used in place of an FR metric to reliably measure the depth perception of 3D video with low overhead.
    Inst Elect & Elect Engineers (IEEE), IEEE Signal Proc So